Introduce subassembly offset output artifact #15710

nikola-matic · 2025-01-13T15:25:59Z

Resolves #14827.

- remove bytecode string from SubAssembly
- implement output for CLI
- figure out tests (CLI tests won't work because each commit changes the binary hash and thus the metadata) - Boost?
- test via Yul compilation
- check whether isCreation is preserved during import
- include metadata offset in the structure? [not in this PR]
- docs

cameel

There are some bugs, some missing parts and the overall implementation could be done more robustly.

I'm also not sure if the whole design actually accomplishes our goal. Only giving sub locations may not be enough to locate metadata without heuristics because data objects are not placed in separate subs at evmasm level.

See comments below for details.

libevmasm/Assembly.cpp

libevmasm/LinkerObject.h

cameel · 2025-01-17T16:24:08Z

libsolidity/interface/StandardCompiler.cpp

Please add the feature to the CLI too. We should keep them at parity (and it's a pain for testing/development if the only way to access a feature is through StandardJSON).

Also, Yul compilation is not covered. Aside from general inconsistency, this makes two-step compilation less powerful, which may be a problem for future parallelization.

cameel · 2025-01-17T16:48:29Z

libevmasm/Assembly.cpp

This needs to be implemented for EOF as well (i.e. in assembleEOF(), or, if possible, just in assemble() covering both with the same code).

Since we're at the stage where EOF is passing semantic tests (and they will be enabled by default quite soon), we should start requiring all new features to work on EOF as well.

cameel · 2025-01-17T17:17:43Z

libevmasm/Assembly.cpp

+{
+	for (auto& subAssembly: _subAssemblies)
+	{
+		subAssembly.start = _currentBytecodeSize - subAssembly.length;


I'm either missing something or this assumes that all subassemblies overlap and extend to the end of their parent assembly. Does it even work with more than one subassembly? If you have two subassemblies of the same length then you will end up with the same start location for both. And even if they're of different lengths it will be wrong.

EDIT: Yeah, you even have a test showing this overlap (standard_subassembly_offsets/):

{
    "isCreation": false,
    "length": 780,
    "start": 1007
},
{
    "isCreation": true,
    "length": 130,
    "start": 1657,
    "subs": [

You should also have some asserts here. At the very least that a subassembly does not stick outside of its parent assembly.
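To make the concern concrete, here is a minimal sketch, not the PR's code. `SubRange`, `startsFromEnd`, `startsSequential` and `overlaps` are hypothetical names, and the sequential variant assumes, purely for illustration, that subassemblies are laid out back to back from a known offset. The parent size of 1787 is inferred from the test output quoted above, where both entries end at the same byte (1007 + 780 = 1657 + 130 = 1787).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-subassembly record discussed above.
struct SubRange
{
	std::size_t start = 0;
	std::size_t length = 0;
};

// The formula from the diff: every sub is assumed to end where the parent ends.
inline std::vector<SubRange> startsFromEnd(std::size_t _parentSize, std::vector<std::size_t> const& _lengths)
{
	std::vector<SubRange> result;
	for (std::size_t length: _lengths)
	{
		assert(length <= _parentSize);
		result.push_back({_parentSize - length, length});
	}
	return result;
}

// Alternative: record the offset at which each sub is actually appended.
// Assumption (illustration only): subs are placed back to back from _firstSubOffset.
inline std::vector<SubRange> startsSequential(
	std::size_t _parentSize,
	std::size_t _firstSubOffset,
	std::vector<std::size_t> const& _lengths
)
{
	std::vector<SubRange> result;
	std::size_t cursor = _firstSubOffset;
	for (std::size_t length: _lengths)
	{
		// A subassembly must not stick outside of its parent.
		assert(cursor <= _parentSize && length <= _parentSize - cursor);
		result.push_back({cursor, length});
		cursor += length;
	}
	return result;
}

// True if the two half-open byte ranges share at least one byte.
inline bool overlaps(SubRange const& _a, SubRange const& _b)
{
	return _a.start < _b.start + _b.length && _b.start < _a.start + _a.length;
}
```

With lengths 780 and 130 in a parent of size 1787, `startsFromEnd` reproduces the starts 1007 and 1657 seen in the test, and the two ranges overlap. Tracking a running cursor avoids that, and the assert catches a sub that would run past the end of its parent.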

cameel · 2025-01-17T17:57:13Z

test/cmdlineTests/standard_import_ast_select_bytecode/output.json

+                        "subAssemblyOffsets": {
+                            "subs": [
+                                {
+                                    "isCreation": true,
+                                    "length": 130,
+                                    "start": 0,
+                                    "subs": [
+                                        {
+                                            "isCreation": false,
+                                            "length": 104,
+                                            "start": 26
+                                        }
+                                    ]
+                                }
+                            ]
+                        }


Actually, is Sourcify ok with getting only subassembly locations? I've always thought of metadata as a separate subassembly myself, because it's a separate data object in Yul, but looking at the PR I remembered that this is actually not the case at the evmasm level. Metadata goes into Assembly::m_auxiliaryData and logically becomes a part of each assembly's bytecode, not a separate object. It's simply appended after all the subs and other data. Because of this, I think it will still be necessary to use heuristics to fish out its location within the assembly, even knowing the location of all subassemblies.

I think we may need to include the start and length of Assembly::m_auxiliaryData separately for each sub (when it's non-empty). For completeness we may want to simply list all the data chunks (that would actually have been nice to have for compiler debugging in some cases).

CC @kuzdogan.

By the way, we should have some tests that include data objects between assemblies.

Also, if we had the exact location of metadata, we could create a more robust, Boost-based test that would get the structure info and the bytecode and diff the CBOR bit. It's not easy to spot a problem just looking at the values in command-line tests and we're not even shown the bytecode.

Sorry I missed this, I'll have a look at this tomorrow

I talked about it later with @nikola-matic. He said the justification for not adding data locations was that Sourcify has no problem detecting metadata at the top level; only the nested contracts cause problems, so adding information about where whole contracts start and end is enough to extend the existing mechanism to cover them.

I still think we should include the exact location though. We do have it and it's very easy to add, I see no reason to force tools to use heuristics to find it.

One thing we agreed on though was that the info about data locations does not have to be a part of this very PR. This will still be a working feature without it and the extra fields can be added on top of it as an extension.

@cameel Isn't it safe to assume the CBOR will be at the end of all of the assemblies that are not creation: true? So we can just look at all of them, get the last two bytes and decode. So it's technically not a heuristic but a rule?

Of course I wouldn't say no to this and it would make our lives easier.

Ah, thanks for pointing that out. I completely forgot that CBOR is not the only thing there and that we actually also add the length ourselves. You're right. With that the check should be reliable, at least when we're talking about the contracts you compile yourself and can be sure that the metadata is supposed to be present.

Ok then, I guess it's not strictly necessary for Sourcify in that case.

I still don't see much downside in providing that information though :)
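For reference, the rule discussed above can be sketched roughly as follows. This is an illustration under assumptions, not Sourcify's or the compiler's code: `MetadataLocation` and `locateMetadata` are hypothetical names, the sub's `start` and `length` are taken to be absolute offsets into the full bytecode, and the sub is assumed to be a deployed (non-creation) contract compiled with metadata enabled. It relies only on the fact that the CBOR payload is followed by its length as a two-byte big-endian value.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type: where the CBOR payload sits inside the full bytecode.
struct MetadataLocation
{
	std::size_t start = 0;  // offset of the first CBOR byte
	std::size_t length = 0; // CBOR payload length, excluding the 2-byte length suffix
};

// Reads the two-byte big-endian length at the end of the given subassembly range
// and derives the location of the CBOR payload that precedes it.
// Returns std::nullopt if the range or the declared length is implausible.
inline std::optional<MetadataLocation> locateMetadata(
	std::vector<std::uint8_t> const& _bytecode,
	std::size_t _subStart,
	std::size_t _subLength
)
{
	// Reject ranges that stick outside of the bytecode or cannot hold the suffix.
	if (_subStart > _bytecode.size() || _subLength > _bytecode.size() - _subStart || _subLength < 2)
		return std::nullopt;

	std::size_t const end = _subStart + _subLength;
	std::size_t const cborLength = (std::size_t(_bytecode[end - 2]) << 8) | _bytecode[end - 1];

	// The payload plus suffix must fit inside the subassembly.
	if (cborLength > _subLength - 2)
		return std::nullopt;

	return MetadataLocation{end - 2 - cborLength, cborLength};
}
```

The sketch does not decode the CBOR itself. If the output ever lists the location of Assembly::m_auxiliaryData directly, a tool could skip this step and read the range from the artifact.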

test/cmdlineTests/standard_no_append_cbor/output.json

test/cmdlineTests/standard_subassembly_offsets/input.json

libsolidity/interface/StandardCompiler.cpp

cameel · 2025-01-17T18:56:26Z

test/cmdlineTests/standard_import_asm_json_immutable_references/output.json

+                        "subAssemblyOffsets": {
+                            "subs": [
+                                {
+                                    "isCreation": false,
+                                    "length": 87,
+                                    "start": 0
+                                }
+                            ]
+                        }


The input file has two assemblies but the output shows only one. Is this a bug in your feature or an instance of #15725 (because the nested assembly is unreferenced)?

cameel · 2025-01-17T19:22:25Z

test/cmdlineTests/standard_import_asm_json_immutable_references/output.json

+                        "subAssemblyOffsets": {
+                            "subs": [
+                                {
+                                    "isCreation": false,


By the way, I wonder why this isn't true. Does exporting and reimporting asm JSON lose the creation status or is it just because of how the artificial input was crafted? If the status gets lost, this would be a bug (and should be reported).

cameel · 2025-01-17T19:25:22Z

test/cmdlineTests/standard_subassembly_offsets/input.json

Oh, we also need some coverage for optimized compilation.

cameel · 2025-01-23T15:48:43Z

Some more thoughts after today's call:

- You asked if the top-level assembly is always a creation object. Just wanted to note that this is actually irrelevant to this task. Just follow whatever Assembly::isCreation() says.
- About the complication with adding info for the top-level object: I guess the thing that makes it awkward to implement is partly the fact that you're trying to store info about a complete subhierarchy in every LinkerObject. You can do what we agreed on, but just FYI there's also a different way you could go about it:
  - In LinkerObject store only info about start and length of its immediate subs. No info about nested objects or the current object.
  - Move the calculation of the whole hierarchy to a helper like Assembly::structure() that would traverse the m_subs tree, creating corresponding SubAssembly structs as it goes through it. It has access to all the necessary info in each Assembly and its LinkerObject.
  - You'd just have to make sure that CompilerStack/YulStack/EVMAssemblyStack either expose the Assembly or have a method that asks the Assembly they store inside to produce this artifact.
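A rough sketch of that alternative, using simplified stand-ins rather than the real classes. `SubLocation`, `LinkerObjectSketch`, `AssemblySketch` and `StructureNode` are hypothetical, and the free function `structure()` only mirrors the helper suggested above. Sub offsets are stored relative to the parent and accumulated into absolute positions during traversal; whether the artifact should report absolute or parent-relative offsets is a design choice this sketch does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: start/length of one immediate sub, relative to its parent's bytecode.
struct SubLocation
{
	std::size_t start;
	std::size_t length;
};

// Hypothetical, heavily simplified stand-in for LinkerObject.
struct LinkerObjectSketch
{
	std::size_t size = 0;
	std::vector<SubLocation> immediateSubs; // immediate subs only, nothing nested
};

// Hypothetical, heavily simplified stand-in for Assembly.
struct AssemblySketch
{
	bool isCreation = false;
	LinkerObjectSketch object;
	std::vector<AssemblySketch> subs; // same order as object.immediateSubs
};

// One node of the output artifact.
struct StructureNode
{
	bool isCreation;
	std::size_t start;
	std::size_t length;
	std::vector<StructureNode> subs;
};

// Walks the sub tree and builds the whole hierarchy on demand.
inline StructureNode structure(AssemblySketch const& _assembly, std::size_t _start = 0)
{
	assert(_assembly.subs.size() == _assembly.object.immediateSubs.size());
	StructureNode node{_assembly.isCreation, _start, _assembly.object.size, {}};
	for (std::size_t i = 0; i < _assembly.subs.size(); ++i)
	{
		SubLocation const& location = _assembly.object.immediateSubs[i];
		// The recorded length must match the sub, and the sub must fit in its parent.
		assert(location.length == _assembly.subs[i].object.size);
		assert(location.start <= _assembly.object.size);
		assert(location.length <= _assembly.object.size - location.start);
		node.subs.push_back(structure(_assembly.subs[i], _start + location.start));
	}
	return node;
}
```

With a creation assembly of size 130 holding one deployed sub of length 104 at offset 26 (the values from the standard_import_ast_select_bytecode test output), this yields a root node at 0 with a single child at 26. Each LinkerObject stays small, and the top-level object needs no special casing because it is just the root of the traversal.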

cameel · 2025-02-10T14:50:16Z

Another thing missing here is the docs. The new option should at the very least be mentioned in the Standard JSON input description. It would also be good to have a paragraph on the Metadata page explaining that nested contracts can have metadata as well and that the output of this option (combined with the length marker at the end of metadata) can be used to locate it.

github-actions · 2025-03-15T12:05:24Z

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

matheusaaguiar · 2025-03-31T04:25:15Z

solc/CommandLineInterface.cpp

@@ -137,6 +137,7 @@ static std::string const g_strSrcMapRuntime = "srcmap-runtime";
 static std::string const g_strStorageLayout = "storage-layout";
 static std::string const g_strTransientStorageLayout = "transient-storage-layout";
 static std::string const g_strVersion = "version";
+static std::string const g_strAssemblyStructure = "assembly-structure";


nit: options are in alphabetical order here.

matheusaaguiar · 2025-04-01T15:28:13Z

libevmasm/LinkerObject.h

@@ -89,6 +89,15 @@ struct LinkerObject
 	/// Bytecode offsets of named tags like function entry points.
 	std::map<std::string, FunctionDebugData> functionDebugData;

+	struct Structure {


Maybe AssemblyStructure is better?

nikola-matic requested review from cameel and ekpyron January 13, 2025 15:26

nikola-matic force-pushed the introduce-subassembly-offset-output-artifact branch from d086ad9 to e4f8896 Compare January 17, 2025 10:39

cameel requested changes Jan 17, 2025

cameel reviewed Jan 17, 2025

cameel mentioned this pull request Jan 21, 2025

Expose sub-structure of bytecode #9332

Open

cameel mentioned this pull request Jan 24, 2025

eof: Semantic tests update #15665

Merged

nikola-matic force-pushed the introduce-subassembly-offset-output-artifact branch 4 times, most recently from 939e863 to a624455 Compare February 5, 2025 11:03

nikola-matic force-pushed the introduce-subassembly-offset-output-artifact branch 2 times, most recently from 6726747 to 8733981 Compare February 19, 2025 10:30

nikola-matic force-pushed the introduce-subassembly-offset-output-artifact branch 2 times, most recently from 21e2558 to 1eebef2 Compare February 28, 2025 10:31

ethereum deleted a comment from stackenbotten Feb 28, 2025

nikola-matic force-pushed the introduce-subassembly-offset-output-artifact branch 3 times, most recently from 1af3027 to 218f4d7 Compare February 28, 2025 13:21

github-actions bot added the stale label Mar 15, 2025

nikola-matic removed the stale label Mar 15, 2025

nikola-matic added this to the 0.8.30 milestone Mar 15, 2025

nikola-matic self-assigned this Mar 15, 2025

nikola-matic force-pushed the introduce-subassembly-offset-output-artifact branch from c3c04f6 to 8846e75 Compare March 17, 2025 10:46

nikola-matic marked this pull request as ready for review March 17, 2025 11:51

matheusaaguiar reviewed Apr 1, 2025

nikola-matic force-pushed the introduce-subassembly-offset-output-artifact branch from 8846e75 to d940dac Compare June 4, 2025 12:54

nikola-matic added 2 commits June 4, 2025 22:46

Introduce sub-assembly offset output artifact

733612b

Changelog and docs

a6fe7b8

nikola-matic force-pushed the introduce-subassembly-offset-output-artifact branch from d940dac to a6fe7b8 Compare June 4, 2025 20:46

nikola-matic modified the milestones: 0.8.30, 0.8.31 Jun 5, 2025

manuelwedler mentioned this pull request Jul 2, 2025

List of metadata improvements ethereum/sourcify#1523

Open
